Abstract

Various methods have been proposed in the literature to determine an 
optimal partitioning of the actors in a network into core and periphery 
subsets. However, these methods either work only for relatively small 
input sizes, or do not guarantee an optimal answer. In this paper, we 
propose a new algorithm to solve this problem. This algorithm is efficient 
and exact, allowing the optimal partitioning for networks of several thou- 
sand actors to be computed in under a second. We also show that the 
optimal core can be characterized as a set containing the actors with the 
highest degrees in the original network. 

1 Introduction 

A concept that is prevalent in the field of social network analysis is the core/periphery
model. Such models arise in many fields of research, ranging from corporate
structure (Barsky, 1999) and world economics (Smith and White, 1992) to scientific
citation networks (Mullins et al., 1977; Doreian, 1985) and Japanese monkeys
(Corradino, 1990).

As discussed in Borgatti and Everett (1999), a discrete core/periphery model 
can be formulated as follows: consider a set of n actors, labelled 1,2, ... ,n, and 
suppose that certain pairs of these actors interact. The idea behind the model 
is that the actors can be partitioned into a cohesive subgraph (a 'core') and a 
loosely-connected 'periphery'. A simple example is a star graph, where the only 
ties that exist are those connecting a distinguished node (1, say) to each of the 
other nodes. Then node 1 forms the core, and the others form the periphery. 

Several algorithms have been suggested for finding an optimal or near-optimal
decomposition of such a set into its core and peripheral parts. The
simplest approach is to try all possible subsets as the 'core', and pick the one
that works best. However (as noted by Boyd et al., 2006), there are exponentially
many such subsets, so this becomes infeasible quite rapidly as n increases.
It therefore appears to be necessary to resort to heuristics, or prune the search
space in some way. Algorithms based on the former approach include the genetic
algorithm of Borgatti et al. (2002) in the UCINET software package, as well
as algorithms based on simulated annealing and the Kernighan-Lin algorithm,
considered by Boyd et al. (2006). An example of an algorithm which prunes the
search space can be found in the recent paper of Brusco (2011), which develops
an exact algorithm based on the branch-and-bound technique that is feasible
for networks with up to about 60 actors.

1 DAMTP, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road,
Cambridge CB3 0WA, UK. E-mail address: S.Z.W.Lip@damtp.cam.ac.uk

In this light, the main result of this paper might seem surprising: namely, 
that it is possible to solve this problem exactly and efficiently, without resort- 
ing to heuristics or pruning! This is true for both symmetric and asymmetric 
networks. The solutions that will be described in this paper are very fast, and 
therefore easily scalable to large networks. The basis of the algorithm is a greedy
procedure that systematically picks actors with maximal degree to form part of
the 'core', and we will also prove that this algorithm gives an optimal solution.

2 Statement of the Problem 

We adopt a similar formulation to that used in Brusco (2011), and first consider
the case of symmetric networks (in which $A_{ij} = A_{ji}$ for all i and j). The
symmetric core/periphery bipartitioning problem is defined as follows:

• There are n actors, labelled 1, 2, ..., n, and an n x n binary adjacency
matrix A such that $A_{ij} = 1$ if actor i interacts with actor j, and $A_{ij} = 0$
otherwise. (We do not consider self-interactions, and assume that, for each
i, we have $A_{ii} = 0$.)

• Define $S = \{1, \ldots, n\}$. We wish to find a proper, non-empty 'core' subset
$S_1 \subset S$ such that the following quantity is minimized:

$$Z(S_1) = \sum_{(i<j) \in S_1} I_{\{A_{ij}=0\}} + \sum_{(i<j) \notin S_1} I_{\{A_{ij}=1\}} \tag{1}$$

(Here, the first sum runs over pairs of actors that both lie in $S_1$, and the
second over pairs that both lie outside $S_1$. We have employed the indicator
function $I_{\{P\}}$, which is equal to 1 if the predicate P is true, and 0 if P is false.)

The intuitive idea behind this formulation is that we wish to maximize the num- 
ber of ties between actors in the core, and minimize the number of ties between 
actors in the periphery. In an ideal scenario, there would be ties between every 
pair of actors in the core, and no ties between any pair of actors in the periph- 
ery. Notice that ties between core actors and periphery actors do not appear in 
the expression for $Z(S_1)$; this is consistent with the goal of Boyd et al. (2006)
of finding a bipartition that simultaneously maximizes connectivity in the core 
block and minimizes connectivity in the periphery block. 
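
To make the objective concrete, the following is a minimal sketch that evaluates $Z(S_1)$ directly from its definition. It assumes zero-based node labels and an adjacency matrix stored as a list of lists; the function name `core_periphery_cost` is hypothetical and chosen only for illustration.

```python
from itertools import combinations

def core_periphery_cost(adj, core):
    """Evaluate Z(S_1): missing ties inside the core plus ties inside the periphery.

    adj  -- symmetric 0/1 adjacency matrix (list of lists) with a zero diagonal
    core -- iterable of zero-based node labels forming the core S_1
    """
    n = len(adj)
    core = set(core)
    periphery = [i for i in range(n) if i not in core]
    # Pairs inside the core that are NOT tied.
    missing_in_core = sum(1 for i, j in combinations(sorted(core), 2) if adj[i][j] == 0)
    # Pairs inside the periphery that ARE tied.
    ties_in_periphery = sum(1 for i, j in combinations(periphery, 2) if adj[i][j] == 1)
    return missing_in_core + ties_in_periphery

# Star graph on 5 nodes: node 0 is tied to every other node.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(core_periphery_cost(star, {0}))  # core = the hub
print(core_periphery_cost(star, {1}))  # core = a single leaf
```

For the star graph, choosing the hub as the core leaves no ties inside the periphery, whereas choosing a leaf leaves the hub's remaining ties in the periphery block.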

3 The Algorithm

We now present a simple algorithm that solves the above problem in $O(n^2)$
time. Before doing this, however, we pause to make two definitions:

• The degree of a node i is the number of ties incident to i. We represent
this quantity by deg(i). It can be seen that $\deg(i) = \sum_{j \in S} A_{ij}$.

• Given a node i, and a subset $T \subseteq S$, we define $\delta_T(i)$ to be the number of
ties joining i with a node in T. In other words, $\delta_T(i) = \sum_{j \in T} A_{ij}$.

We now consider a restricted version of the problem, under the assumption that
the number of actors in the core, $S_1$, is fixed at the outset and is equal to k
(where $1 \le k < n$). There are therefore $k(k-1)/2$ pairs of distinct actors in $S_1$,
and each pair either has a tie between them, or it does not. So we can write:

$$\sum_{(i<j) \in S_1} I_{\{A_{ij}=0\}} + \sum_{(i<j) \in S_1} I_{\{A_{ij}=1\}} = \frac{k(k-1)}{2}. \tag{2}$$

Furthermore, the number of ties contributed by each node $i \notin S_1$ to the periphery
set is simply the degree of i, less the number of ties joining i to a node in
$S_1$. Since each tie within the periphery is counted once from each of its two
endpoints, we can therefore write:

$$\sum_{(i<j) \notin S_1} I_{\{A_{ij}=1\}} = \frac{1}{2} \sum_{i \notin S_1} \left( \deg(i) - \delta_{S_1}(i) \right). \tag{3}$$

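Using these two results, we can express $Z(S_1)$ as follows:

$$
\begin{aligned}
Z(S_1) &= \sum_{(i<j) \in S_1} I_{\{A_{ij}=0\}} + \sum_{(i<j) \notin S_1} I_{\{A_{ij}=1\}} \\
&= \frac{k(k-1)}{2} - \frac{1}{2} \sum_{i \in S_1} \delta_{S_1}(i) + \frac{1}{2} \sum_{i \notin S_1} \deg(i) - \frac{1}{2} \sum_{i \notin S_1} \delta_{S_1}(i) \\
&= \left( \frac{1}{2} \sum_{i \in S} \deg(i) + \frac{k(k-1)}{2} \right) - \sum_{i \in S_1} \deg(i)
\end{aligned}
\tag{4}
$$

where the final equality arises because

$$\sum_{i \in S} \delta_{S_1}(i) = \sum_{i \in S} \sum_{j \in S_1} I_{\{A_{ij}=1\}} = \sum_{j \in S_1} \sum_{i \in S} I_{\{A_{ij}=1\}} = \sum_{j \in S_1} \deg(j). \tag{5}$$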
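
As a sanity check on this derivation, the sketch below compares the direct definition of Z with the closed form in terms of degrees, over every proper non-empty subset of a small example graph. The graph, the zero-based labels and the function names (`cost_direct`, `cost_closed_form`, `formulas_agree`) are assumptions made for illustration.

```python
from itertools import combinations

def cost_direct(adj, core):
    """Z(S_1) from the definition: non-ties in the core plus ties in the periphery."""
    core = set(core)
    total = 0
    for i, j in combinations(range(len(adj)), 2):
        if i in core and j in core:
            total += 1 - adj[i][j]
        elif i not in core and j not in core:
            total += adj[i][j]
    return total

def cost_closed_form(adj, core):
    """Z(S_1) = (1/2) * sum of all degrees + k(k-1)/2 - sum of degrees in the core."""
    deg = [sum(row) for row in adj]
    k = len(core)
    # For a symmetric 0/1 matrix the degree sum is even, so // is exact.
    return sum(deg) // 2 + k * (k - 1) // 2 - sum(deg[i] for i in core)

def formulas_agree(adj):
    """Check both formulas on every proper, non-empty core subset."""
    n = len(adj)
    return all(
        cost_direct(adj, c) == cost_closed_form(adj, c)
        for k in range(1, n)
        for c in combinations(range(n), k)
    )

# Example: a triangle 0-1-2 with a tail 2-3-4.
g = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
print(formulas_agree(g))
print(cost_closed_form(g, (0, 1, 2)))
```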

If k is fixed, the terms in the first bracket are independent of the choice of $S_1$,
so our problem reduces to finding an $S_1$ of size k such that the final term,
$\sum_{i \in S_1} \deg(i)$, is maximized. Clearly, we should therefore take $S_1$ to
consist of the k nodes with largest degree in S. This can be done in $O(n \log n)$
time, since it takes $O(n \log n)$ time to sort the nodes in descending order of
degree using a standard algorithm such as merge sort (Knuth, 1998), and a further
$O(k)$ time to construct $S_1$.
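
The fixed-k step can be sketched as follows. This is an illustration only: it uses Python's `heapq.nlargest` in place of a full sort, and the helper name `top_k_core` and the zero-based labels are assumptions.

```python
import heapq

def top_k_core(adj, k):
    """Return k nodes of largest degree for a symmetric 0/1 adjacency matrix."""
    deg = [sum(row) for row in adj]
    # Select the k node labels whose degrees are largest.
    return heapq.nlargest(k, range(len(adj)), key=lambda i: deg[i])

# Star graph on 5 nodes: node 0 is tied to every other node.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
core = top_k_core(star, 2)
print(core)
```

When several nodes share the same degree, any of them may be chosen without changing the value of Z, since only the sum of the selected degrees matters.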

We can now return to the original problem and treat the case in which k
is unknown. Assume that the nodes are sorted in descending order of degree,
and that the resulting list of nodes is $\{v_1, v_2, \ldots, v_n\}$. We can then determine
the optimal $S_1$ by iterating through the possible values of k and calculating the
optimal $Z(S_1)$ for each, and finally taking the best one.

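Note that we need not repeat the calculation from scratch in each iteration,
because of the following observation: the addition of $v_k$ to the optimal set
increases the value of Z by

$$\left( \frac{k(k-1)}{2} - \sum_{i=1}^{k} \deg(v_i) \right) - \left( \frac{(k-1)(k-2)}{2} - \sum_{i=1}^{k-1} \deg(v_i) \right) = k - 1 - \deg(v_k). \tag{6}$$

Initially, the core set is empty, and so the starting value of Z is

$$\sum_{(i<j) \in S} I_{\{A_{ij}=1\}} = \frac{1}{2} \sum_{i \in S} \deg(i). \tag{7}$$

The full algorithm can therefore be specified as follows: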

1. Calculate and store the degrees of each node. Then sort the nodes in
descending order of degree, to get a list of nodes $\{v_1, v_2, \ldots, v_n\}$.

2. Set $Z_{best} := \infty$ and $k_{best} := 0$. (Note: instead of $\infty$, a suitably large upper
bound, such as $n^2$, can be used.)

3. Set $Z := \frac{1}{2} \sum_{i} \deg(v_i)$.

4. For each k from 1 to n - 1, inclusive: set $Z := Z + k - 1 - \deg(v_k)$. Then,
if $Z < Z_{best}$, set $Z_{best} := Z$ and $k_{best} := k$.

5. Set $S_1 := \{v_1, \ldots, v_{k_{best}}\}$.

6. Return $S_1$.
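
The six steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes a symmetric 0/1 adjacency matrix given as a list of lists with a zero diagonal, zero-based node labels, and at least two nodes; the name `find_core` is hypothetical. Ties in degree are broken by the order in which Python's stable sort leaves the nodes.

```python
def find_core(adj):
    """Greedy core/periphery bipartition; returns (core set, optimal Z)."""
    n = len(adj)
    # Step 1: degrees, then nodes in descending order of degree.
    deg = [sum(row) for row in adj]
    order = sorted(range(n), key=lambda i: deg[i], reverse=True)
    # Step 2: initialise the best value found so far.
    z_best, k_best = float("inf"), 0
    # Step 3: Z for the empty core is half the degree sum (the number of ties).
    z = sum(deg) // 2
    # Step 4: add nodes one at a time, updating Z by k - 1 - deg(v_k).
    for k in range(1, n):
        z += k - 1 - deg[order[k - 1]]
        if z < z_best:
            z_best, k_best = z, k
    # Steps 5 and 6: the core is the first k_best nodes in the sorted order.
    return set(order[:k_best]), z_best

# Star graph on 5 nodes: node 0 is tied to every other node.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(find_core(star))
```

On the star graph the sketch selects the hub alone as the core, in agreement with the example given in the Introduction.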

Reading the input (i.e., the adjacency matrix describing the network) takes
$O(n^2)$ time, and so does calculating the degrees of each node. Sorting the nodes
takes $O(n \log n)$ time, and all other operations take $O(n)$ time, so the algorithm
runs in $O(n^2)$ time. If the input data is presented in the form of an adjacency
list (i.e., as a set of n lists such that the i-th list contains the neighbours of
the i-th actor), or simply as a list of existing ties, the algorithm would run in
$O(n \log n + m)$ time, where m is the number of ties in the network.
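
A sketch of the list-of-ties variant is given below. It assumes that each tie appears exactly once as an unordered pair of zero-based labels and that there are at least two nodes; the name `find_core_from_ties` is hypothetical.

```python
def find_core_from_ties(n, ties):
    """Greedy core/periphery bipartition from a list of ties; returns (core, Z)."""
    # Accumulate degrees in O(m) time.
    deg = [0] * n
    for i, j in ties:
        deg[i] += 1
        deg[j] += 1
    # Sort nodes by descending degree in O(n log n) time.
    order = sorted(range(n), key=lambda i: deg[i], reverse=True)
    # With an empty core, Z equals the number of ties m.
    z = len(ties)
    z_best, k_best = float("inf"), 0
    for k in range(1, n):
        z += k - 1 - deg[order[k - 1]]
        if z < z_best:
            z_best, k_best = z, k
    return set(order[:k_best]), z_best

# Star graph on 5 nodes, written as a list of ties.
star_ties = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(find_core_from_ties(5, star_ties))
```

Because the adjacency matrix is never materialised, the memory use of this variant is proportional to n + m rather than to n squared.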

This algorithm is therefore a significant improvement on both the branch-and-bound
and the heuristic approaches. The branch-and-bound method provides
an optimal answer, but is slow; the heuristic approaches do not guarantee
an optimal answer. The algorithm just described provides an optimal answer, 
and does so quickly. 

As an aside, it is possible to improve the main part of this algorithm further.
Let $Z_k$ be the value of Z after the k-th iteration of the algorithm. Notice that
the sequence $\{k - 1 - \deg(v_k) : 1 \le k < n\}$ is non-decreasing, since the sequence
$\{\deg(v_k) : 1 \le k < n\}$ is non-increasing. Therefore, there exists a $k^*$ such
that $Z_1 \ge Z_2 \ge \cdots \ge Z_{k^*} \le Z_{k^*+1} \le \cdots \le Z_{n-1}$. This observation allows
us to determine the optimum value of k in $O(\log n)$ time, by binary searching
on k to find the largest $k^*$ such that $k^* - 1 - \deg(v_{k^*}) < 0$. (If no such k
exists, which happens only when the network has no ties at all, we take $k^* = 1$.)
Once we have found this optimal value, we pick our core to be
$S_1 = \{v_1, \ldots, v_{k^*}\}$, as before.
However, this does not lead to an order-of-magnitude improvement in the time
complexity, because, e.g., it still takes $O(n \log n)$ time to sort the nodes at the
beginning of the algorithm.
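
The binary search described in this aside might look as follows. The sketch takes the degrees already sorted in non-increasing order and returns the core size; the name `best_core_size` is hypothetical, and the fallback to a core of size 1 when no increment is negative is an assumption made so that the core is never empty.

```python
def best_core_size(sorted_degrees):
    """Largest k in [1, n-1] with k - 1 - deg(v_k) < 0, or 1 if there is none.

    sorted_degrees -- node degrees in non-increasing order (length n >= 2)
    """
    n = len(sorted_degrees)
    lo, hi = 1, n - 1
    best = 1  # fallback: the core must contain at least one node
    while lo <= hi:
        mid = (lo + hi) // 2
        # The increment k - 1 - deg(v_k) is non-decreasing in k, so the
        # values of k with a negative increment form a prefix of 1..n-1.
        if mid - 1 - sorted_degrees[mid - 1] < 0:
            best = mid
            lo = mid + 1
        else:
            hi = mid - 1
    return best

print(best_core_size([4, 1, 1, 1, 1]))  # star graph on 5 nodes
print(best_core_size([3, 2, 2, 2, 1]))  # triangle with a two-node tail
```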

4 Generalization to Asymmetric Networks 

In the version of the problem described by Brusco (2011), the underlying networks
were allowed to be symmetric or asymmetric. We now consider the asymmetric
case. The definition of the matrix A then changes slightly: we now have
$A_{ij} = 1$ if there is a tie from actor i to actor j, and $A_{ij} = 0$ otherwise. The
objective is now to find a proper subset $S_1 \subset S$ such that

$$Z(S_1) = \frac{1}{2} \sum_{(i<j) \in S_1} \left( I_{\{A_{ij}=0\}} + I_{\{A_{ji}=0\}} \right) + \frac{1}{2} \sum_{(i<j) \notin S_1} \left( I_{\{A_{ij}=1\}} + I_{\{A_{ji}=1\}} \right) \tag{8}$$

is minimized.

To solve this version of the problem, we introduce a symmetric weight function
$w(i,j) = \frac{1}{2} \left( I_{\{A_{ij}=1\}} + I_{\{A_{ji}=1\}} \right)$, for any two nodes $i \ne j$. Then we can
write

$$Z(S_1) = \sum_{(i<j) \in S_1} \left( 1 - w(i,j) \right) + \sum_{(i<j) \notin S_1} w(i,j). \tag{9}$$

Finally, we redefine $\deg(i) = \sum_{j \in S} w(i,j)$ and $\delta_T(i) = \sum_{j \in T} w(i,j)$. It is now
straightforward to check that the analysis in Section 3 carries over to this case,
after we replace $I_{\{A_{ij}=0\}}$ with $(1 - w(i,j))$, and $I_{\{A_{ij}=1\}}$ with $w(i,j)$. Therefore,
the algorithm in Section 3 still holds (albeit with a modified definition of degree),
and its time complexity remains unchanged.
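
A sketch of the asymmetric variant follows. To avoid fractions, it works with the doubled quantities 2·deg(i) and 2·Z and halves the result at the end; this bookkeeping choice, the zero-based labels and the name `find_core_asymmetric` are assumptions made for illustration.

```python
def find_core_asymmetric(adj):
    """Greedy bipartition for a directed 0/1 adjacency matrix; returns (core, Z)."""
    n = len(adj)
    # Doubled weighted degree: 2 * deg(i) = sum over j != i of (A_ij + A_ji).
    deg2 = [sum(adj[i][j] + adj[j][i] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: deg2[i], reverse=True)
    # Doubled starting value: 2 * Z = sum of deg(i) = (sum of deg2) / 2,
    # which is the number of directed ties and hence an integer.
    z2 = sum(deg2) // 2
    z2_best, k_best = float("inf"), 0
    for k in range(1, n):
        # Doubled increment: 2 * (k - 1 - deg(v_k)).
        z2 += 2 * (k - 1) - deg2[order[k - 1]]
        if z2 < z2_best:
            z2_best, k_best = z2, k
    return set(order[:k_best]), z2_best / 2

# Directed star on 4 nodes: node 0 sends a tie to every other node,
# and none of the ties is reciprocated.
out_star = [
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(find_core_asymmetric(out_star))
```

For a symmetric input matrix, w(i,j) reduces to $A_{ij}$ and the sketch returns the same value of Z as the symmetric algorithm.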

5 Tests of the Algorithm 

For input graphs with n up to about 1000, the algorithm runs in under a second.
This could be sped up significantly if the graph is sparse ($m \ll n^2$) and the data
is presented in the form of an adjacency list (or a list of ties), since the algorithm
then takes $O(n \log n + m)$ time and can therefore handle networks with n up to about
50000 in under a second. (These estimates are conservative.)

As a check, the algorithm described in Section 3 was tested, together with
the brute force algorithm (which tries every possible subset of S as the core and
is therefore guaranteed to produce the optimal answer), on 100 random input
cases with $5 \le n \le 25$. Both algorithms produced the same answer each time,
and our algorithm was noticeably faster.
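
A cross-check of this kind might be set up as in the sketch below. It is not the test harness used for the results reported here: the random graph model, the seed, the function names and the smaller range of n (chosen so that the brute-force search stays fast in pure Python) are all assumptions.

```python
import random
from itertools import combinations

def cost(adj, core):
    """Z(S_1): non-ties inside the core plus ties inside the periphery."""
    core = set(core)
    total = 0
    for i, j in combinations(range(len(adj)), 2):
        if i in core and j in core:
            total += 1 - adj[i][j]
        elif i not in core and j not in core:
            total += adj[i][j]
    return total

def brute_force_min(adj):
    """Minimum Z over every proper, non-empty core subset."""
    n = len(adj)
    return min(cost(adj, c) for k in range(1, n) for c in combinations(range(n), k))

def greedy_min(adj):
    """Minimum Z found by the degree-ordered algorithm of Section 3."""
    deg = sorted((sum(row) for row in adj), reverse=True)
    z = sum(deg) // 2
    best = float("inf")
    for k in range(1, len(adj)):
        z += k - 1 - deg[k - 1]
        best = min(best, z)
    return best

def random_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix in which each tie is present with probability p."""
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i][j] = adj[j][i] = 1
    return adj

def cross_check(trials=50, seed=0):
    """Return True if both methods agree on every random test case."""
    rng = random.Random(seed)
    for _ in range(trials):
        adj = random_graph(rng.randint(5, 10), rng.random(), rng)
        if brute_force_min(adj) != greedy_min(adj):
            return False
    return True

print(cross_check())
```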

6 Conclusions 

We have presented an exact, efficient algorithm to solve a discrete core/periphery 
bipartitioning problem. This algorithm outperforms both the heuristic and ex- 
haustive search methods that have so far been used, and vastly increases the 
sizes of the problems that can be tackled. 

We also offer the qualitative insight that the actors which make up the core 
are simply the ones with the most connections in the original network. As 
the actors with highest degree are inserted into the core, the size of the core 
increases until it hits a well-defined threshold. Beyond this threshold, it becomes 
less attractive to add new actors to the core because the degrees of the entering 
actors are not large enough to compensate for the core's increasing size. 

Note that this particular formulation of the core/periphery bipartitioning 
problem is solved by choosing the most central nodes to lie in the core, where 
'centrality' in this case is defined as degree centrality. However, other measures 
of centrality are often used (Wasserman and Faust, 1994), and it may be pos- 
sible to formulate alternative definitions of a core/periphery bipartitioning in 
which the optimal solution takes into account the betweenness or closeness cen- 
tralities of the actors. Furthermore, it would be interesting to try and extend 
the algorithm presented in this paper to other variants of the core/periphery 
bipartitioning problem, some of which have continuous (as opposed to discrete) 
formulations. 